Abstract — Speech coders are essential components of mobile
communication systems. They determine the quality of the
recovered speech and the capacity of the system. The original
coder used in the European digital cellular standard GSM goes
by the rather grandiose name of regular pulse excited long-term
prediction (RPE-LTP) codec. This codec has a net bit rate of
13 Kbps and was chosen after conducting exhaustive subjective
tests on various competing codecs. The GSM codec is relatively
complex and power hungry. For that reason the Adaptive
Multi-Rate (AMR) codec is now usually used in the GSM system.
It comprises a family of codecs that achieve lower bit rates
than other coders while delivering toll-quality speech. These
coders are multi-rate ACELP coders with 8 modes, operating at
bit rates from 12.2 Kbps down to 4.75 Kbps. In this article, we
discuss the different types and parameters of the AMR codec
that make the GSM system more efficient.

Index Terms — LPC, AMR codec 

I. Introduction 

Speech coding methods can broadly be classified as lossless
or lossy. In lossless coding, the decoder can reconstruct the
speech signal exactly, recovering the same waveform as the
input speech signal. In lossy coding, the reconstructed speech
signal differs from the original speech waveform [18] [12].
Most speech coding techniques are lossy techniques, in which
irrelevant information is removed. In mobile communication
systems, the design and subjective testing of speech coders
has been extremely difficult. The goal of all speech coding
systems is to transmit speech with the highest possible
quality using the least possible channel capacity.
The hierarchy of speech coders is shown in Figure 1.1.

• Attributes of speech coders 

Speech coding either enhances the quality of a speech 
signal at a particular bit-rate or minimizes the bit-rate at a 
given quality. 

Desirable properties of a speech coder include:

• Low bit-rate 

• High speech quality 

• Robustness to different speakers/languages 

• Robustness to channel errors

• Low memory requirements 

• Low computational complexity

• Low coding delay 

Figure 1.1: Hierarchy of speech coders [18]


II. Classification by Coding Technique 

Speech coders differ widely in their approaches to 
achieving signal compression. Based on the means by which 
they achieve compression, speech coders are broadly 
classified into four categories [10] [12]. 

> Waveform coders 

> Vocoders 

> Parametric coders 

> Hybrid coders 

Waveform coders essentially try to reproduce the time
waveform of the speech signal as closely as possible. They
are, in principle, designed to be source independent and can 
hence code equally well a variety of signals [18] [4]. They 
have the advantage of being robust for a wide range of speech 
characteristics and for noisy environments. 

A vocoder is an analysis/synthesis system used to reproduce
human speech. The vocoder was originally developed as a 
speech coder for telecommunications applications in the 
1930s, the idea being to code speech for transmission. 
Transmitting the parameters of a speech model instead of a 
digitized representation of the speech waveform saves 
bandwidth in the communication channel; the parameters of 
the model change relatively slowly, compared to the changes 
in the speech waveform that they describe. Its primary use in 
this fashion is for secure radio communication, where voice 
has to be encrypted and then transmitted. 

A variety of different forms of audio codec or vocoder are
available for general use, and the GSM system supports a
number of specific audio codecs. These include the RPE-LTP,
half rate, and AMR codecs. The performance of each voice
codec is different and they may be used under different
conditions, although the AMR codec is now the most widely
used [5, 6]. The newer AMR wideband (AMR-WB) codec
is also being introduced into many areas, including GSM.

In parametric coders the speech signal is assumed to be 
generated from a model controlled by some speech 
parameters. In these coders the speech signal is modeled using 
a limited number of parameters corresponding to the speech 
production mechanism. These parameters are obtained by 
analyzing the speech signal before transmission [2, 19] . 

Hybrid coders try to fill the gap between waveform coders
and parametric coders. Hybrid coders operate at medium
bit-rates between those of waveform coders and parametric
coders and produce higher quality speech than parametric
coders. An example of a hybrid coder is the Code Excited
Linear Predictive (CELP) coder.


III. Frequency Domain Coding of Speech 

Frequency domain coders are a class of speech coders which 
take advantage of speech perception and generation models 
without making the algorithm totally dependent on the models 
used. 

The most common types of frequency domain coding include
(i) sub-band coding (SBC) and (ii) block transform coding.

Sub-band coding: Sub-band coding can be thought of as a
method of controlling and distributing quantization noise
across the signal spectrum. Quantization is a non-linear
operation which produces noise products that are typically
broad in spectrum. The human ear does not detect the
quantization distortion products at all frequencies equally
well [9] [18] [12].

In a sub-band coder, speech is typically divided into four or
eight sub-bands by a bank of filters, and each sub-band is
sampled at a band-pass Nyquist rate and encoded with
different accuracy in accordance with a perceptual criterion.
One partitioning of the speech band according to this method,
as suggested by Crochiere et al. [12, 15], is given below.


Table 1.1: Speech band partitioning

Sub-band number    Frequency range
1                  200-700 Hz
2                  700-1310 Hz
3                  1310-2020 Hz
4                  2020-3200 Hz


Sub-band coding can be used for coding speech at bit rates in
the range of 9.6 Kbps to 32 Kbps. In this range, speech quality
is roughly equivalent to that of ADPCM at an equivalent bit
rate. Its complexity and relative speech quality at low bit rates
make it particularly advantageous for coding below about 16
Kbps. The CD-900 cellular telephone system uses sub-band
coding for speech compression. The block diagram of the
sub-band encoder and decoder is shown in Figure 1.2 below.

Figure 1.2: Block diagram of sub-band (a) encoder
(b) decoder [18] [19]

Adaptive Transform Coding (ATC) is another frequency
domain technique that has been successfully used to encode
speech at bit rates in the range 9.6 Kbps to 20 Kbps. This is a
more complex technique which involves block transformation
of windowed input segments of the speech waveform. Each
segment is represented by a set of transform coefficients,
which are separately quantized and transmitted. At the
receiver the quantized coefficients are inverse transformed to
produce a replica of the original input segment.

One of the most attractive and frequently used transforms for
speech coding is the discrete cosine transform (DCT). The
DCT of an N-point sequence x(n) is defined as [18, 8]

$$X_c(k) = \sum_{n=0}^{N-1} x(n)\, g(k) \cos\left[\frac{(2n+1)k\pi}{2N}\right], \qquad k = 0, 1, 2, \ldots, N-1 \tag{1.1}$$

where $g(0) = 1$ and $g(k) = \sqrt{2}$ for k = 1, 2, ..., N-1. The
inverse DCT is defined as:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X_c(k)\, g(k) \cos\left[\frac{(2n+1)k\pi}{2N}\right], \qquad n = 0, 1, 2, \ldots, N-1 \tag{1.2}$$

In practical situations the DCT and IDCT are not evaluated
directly using the above equations. Instead, fast algorithms
developed for computing the DCT in a computationally
efficient manner are used.
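
As an illustration, the forward and inverse transforms (1.1)
and (1.2) can be evaluated directly in a few lines. The sketch
below is only a direct O(N^2) evaluation, assuming the weights
g(0) = 1 and g(k) = sqrt(2) for k >= 1. The function names and
the sample segment are hypothetical, and a practical coder
would use a fast algorithm instead.

```python
import math


def dct(x):
    """Forward DCT, evaluated directly from Eq. (1.1)."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        g = 1.0 if k == 0 else math.sqrt(2.0)  # assumed weights g(k)
        total = 0.0
        for n in range(n_points):
            angle = (2 * n + 1) * k * math.pi / (2 * n_points)
            total += x[n] * g * math.cos(angle)
        result.append(total)
    return result


def idct(coeffs):
    """Inverse DCT, evaluated directly from Eq. (1.2)."""
    n_points = len(coeffs)
    result = []
    for n in range(n_points):
        total = 0.0
        for k in range(n_points):
            g = 1.0 if k == 0 else math.sqrt(2.0)
            angle = (2 * n + 1) * k * math.pi / (2 * n_points)
            total += coeffs[k] * g * math.cos(angle)
        result.append(total / n_points)
    return result


# Hypothetical 8-sample input segment.
segment = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
dct_coeffs = dct(segment)
restored = idct(dct_coeffs)
```

Without quantization the inverse transform recovers the input
segment up to floating-point rounding. In an adaptive
transform coder the loss comes from quantizing the
coefficients between the two steps.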

IV. Vocoders

Vocoders are a class of speech coding systems that analyze the
voice signal at the transmitter, transmit parameters derived
from the analysis, and then synthesize the voice at the receiver
using those parameters. Vocoders or speech codecs are used
within many areas of voice communications. The
focus here is on GSM audio codecs or vocoders, but the same
principles apply to any form of codec [7] [6].

Audio codecs or vocoders are universally used within the
GSM system. They reduce the bit rate of speech that has been
converted from its analogue form into a digital format to enable
it to be carried within the available bandwidth for the channel.
Without the use of a speech codec, the digitized speech would
occupy a much wider bandwidth than would be available.
Accordingly, GSM codecs are a particularly important
element in the overall system [14] [17]. Vocoders are, in
general, much more complex than waveform coders and
achieve very high economy in transmission bit rate. However,
they are less robust and their performance tends to be talker
dependent. The most popular among the vocoding systems is
the linear predictive coder (LPC). Figure 1.3 shows the
traditional speech generation model that is the basis of all
vocoding systems.


Figure 1.3: Speech Generation Model


V. DIFFERENT TYPES OF CODERS 


A. Linear Predictive Coders


LPC vocoders: LPC vocoders belong to the time domain class
of vocoders. This class of vocoders attempts to extract the
significant features of speech from the time waveform. With
LPC, it is possible to transmit good quality voice at 4.8 Kbps
and poorer quality voice at even lower rates.

The linear predictive coding system models the vocal tract as
an all-pole linear filter with a transfer function described by
[18, 19]:

$$H(z) = \frac{G}{1 + \sum_{k=1}^{M} a_k z^{-k}}$$

where G is the gain of the filter, $a_k$ are the M predictor
coefficients, and $z^{-1}$ represents a unit delay
operation. The prediction principles used are similar to those
in ADPCM coders.
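
To make the all-pole model concrete, the following sketch
applies a filter of the form H(z) = G / (1 + sum_k a_k z^-k) to
an excitation sequence using the equivalent difference
equation y[n] = G x[n] - sum_k a_k y[n-k]. The coefficient
symbol a_k, the function name, and the single-pole example are
assumptions chosen for illustration, not values from any
standard.

```python
def all_pole_synthesis(excitation, a, gain):
    """Apply y[n] = gain * x[n] - sum_k a[k-1] * y[n-k].

    `a` holds the assumed predictor coefficients a_1 .. a_M.
    """
    output = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:  # samples before the start are taken as zero
                acc -= a_k * output[n - k]
        output.append(acc)
    return output


# Hypothetical single-pole filter: a_1 = -0.5, so y[n] = x[n] + 0.5 * y[n-1].
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = all_pole_synthesis(impulse, a=[-0.5], gain=1.0)
```

For this example the impulse response halves at every sample,
which is the decaying resonance a single pole produces. A real
LPC vocoder would use a higher-order filter whose coefficients
are re-estimated for every speech frame.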

However, instead of transmitting quantized values of the error
signal representing the difference between the predicted and
actual waveform, the LPC system transmits only selected
characteristics of the error signal. The parameters include the
gain factor, pitch information, and voiced/unvoiced
decision information, which allow approximation of the
correct error signal. At the receiver, the received information
about the error signal is used to determine the appropriate
excitation for the synthesis filter.



This technique requires that the transmitter extract pitch
frequency information, which is often very difficult. Moreover,
the phase coherence between the harmonic components of the
excitation pulse tends to produce a buzzy quality in the
synthesized speech. These problems are reduced in two other
methods:

> Multipulse Excited LPC 

> Code-Excited LPC 

Multi-pulse excited LPC: No matter how well the pulse is
positioned, excitation by a single pulse per pitch period
produces audible distortion. Multi-pulse excitation therefore
uses more than one pulse, typically eight per period, and
adjusts the individual pulse positions and amplitudes
sequentially to minimize a spectrally weighted mean square
error. This technique is called multi-pulse excited LPC
(MPE-LPC) and results in better speech quality, not only
because the prediction error is better approximated by several
pulses per pitch period, but also because the multipulse
algorithm does not require pitch detection. The number of
pulses can be reduced. A variety of different codec
methodologies are used for GSM codecs.

Code-excited LPC: In code-excited LPC (CELP), the coder and
decoder share a predetermined code book of stochastic
excitation signals. For each speech signal the transmitter
searches through its code book of stochastic signals for the
one that gives the best perceptual match to the sound when
used as an excitation to the LPC filter [17] [18] [19]. CELP
coders are extremely complex and can require more than 500
million multiply and add operations per second. They can
provide high quality even when the excitation is coded at only
0.25 bits per sample. These coders can achieve transmission
bit rates as low as 4.8 Kbps. The main principle behind the
CELP codec is known as "Analysis by Synthesis". In this
process, the encoding is performed by perceptually optimizing
the decoded signal in a closed loop system. One way in which
this could be achieved is to compare a variety of generated bit
streams and choose the one that produces the best sounding
signal.
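
The closed-loop search can be sketched as follows. This is a
simplified illustration, not the search of any standardized
codec: it uses a plain squared error in place of a perceptual
weighting, a hypothetical one-coefficient synthesis filter,
and a random code book of 1024 sequences of length 40. All
names are assumptions.

```python
import random


def synthesize(excitation, a):
    """All-pole synthesis: y[n] = x[n] - sum_k a[k-1] * y[n-k]."""
    output = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * output[n - k]
        output.append(acc)
    return output


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the best-matching code book entry."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, a)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Optimal gain for this entry in the least-squares sense.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


rng = random.Random(0)  # fixed seed so both ends could build the same book
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(1024)]
lpc = [-0.9]  # hypothetical synthesis filter coefficient

# Build a target from entry 321 scaled by 2, then check the search finds it.
target = [2.0 * s for s in synthesize(codebook[321], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
```

Only the winning index and gain need to be transmitted, since
the decoder holds the same code book. This is why the
excitation can be coded with a fraction of a bit per sample.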

VSELP (Vector Sum Excited Linear Prediction) codec: One of
the major drawbacks of the VSELP codec is its limited ability
to code non-speech sounds. This means that it performs
poorly in the presence of noise. As a result, this voice codec is
no longer widely used, with newer speech codecs being
preferred and offering far superior performance. Figure 1.5
shows the method for selecting the best excitation
signal. The procedure is best illustrated through an example.


Figure 1.5: Block diagram illustrating the ACELP
Codec [17] [18]

Consider the coding of a short 5 ms block of speech signal. At
a sampling frequency of 8 kHz, each block consists of 40
speech samples. A bit rate of 1/4 bit per sample corresponds to
10 bits per block. Therefore, there are 2^10 = 1024 possible
sequences of length 40 for each block.

Residual (Error) Excited LPC: The rationale behind the
residual excited LPC (RELP) is related to that of the DPCM
technique in waveform coding. In this class of LPC coder,
after estimating the model parameters (LP coefficients or
related parameters) and excitation parameters
(voiced/unvoiced decision, pitch, gain) from a speech frame,
the speech is synthesized at the transmitter and subtracted
from the original speech signal to form a residual signal. The
residual signal is quantized, coded, and transmitted to the
receiver along with the LPC model parameters. At the
receiver the residual error signal is added to the signal
generated using the model parameters to synthesize an
approximation of the original speech signal. The quality of the
synthesized speech is improved due to the addition of the
residual error. Figure 1.6 shows a block diagram of a simple
RELP codec.



Figure 1.6: Block diagram of RELP encoder [18] 


Table 1.2: Speech coders used in various first and
second generation wireless systems [8, 10]

Standard        Service type   Speech coder type used   Bit rate (Kbps)
GSM             Cellular       RPE-LTP                  13
CD-900          Cellular       SBC                      16
USDC (IS-54)    Cellular       VSELP                    8
IS-95           Cellular       CELP                     1.2, 2.4, 4.8, 9.6
IS-95 PCS       PCS            CELP                     14.4
PDC             Cellular       VSELP                    4.5, 6.7, 11.2
CT2             Cordless       ADPCM                    32
DECT            Cordless       ADPCM                    32
PHS             Cordless       ADPCM                    32
DCS-1800        PCS            RPE-LTP                  13
PACS            PCS            ADPCM                    32


VI. The GSM Codec 

The original speech coder used in the pan-European digital
cellular standard GSM goes by the rather grandiose name of
regular pulse excited long term prediction (RPE-LTP) codec.
This codec has a net bit rate of 13 Kbps and was chosen after
conducting exhaustive subjective tests on various competing
codecs. More recent GSM upgrades have improved upon the
original codec specification. The RPE-LTP codec combines
the advantages of the earlier French-proposed baseband
RELP codec with those of the multi-pulse excited long term
prediction (MPE-LTP) codec proposed by Germany. The
main advantage of the baseband RELP codec is that it
provides good quality speech at low complexity. The speech
quality of a RELP codec is, however, limited due to tonal noise
introduced by the process of high frequency regeneration and
by the bit errors introduced during transmission. The
MPE-LTP technique produces excellent
speech quality at high complexity and is not much affected by
bit errors in the channel. By modifying the
RELP codec to incorporate certain features of the MPE-LTP
codec, the net bit rate was reduced from 14.77 Kbps to 13.0
Kbps without loss of quality. The most important
modification was the addition of a long term prediction
loop.

The GSM codec is relatively complex and power hungry.
Figure 2.5 shows a block diagram of the speech encoder.
The encoder comprises four major processing blocks.
The speech sequence is first pre-emphasized, ordered into
segments of 20 ms duration, and then Hamming-windowed.
This is followed by short term prediction (STP) filtering
analysis, where the logarithmic area ratios (LARs) of the
reflection coefficients r_n(k) (eight in number) are computed.
The eight LAR parameters have different dynamic ranges and
probability distribution functions, and hence they are
not all encoded with the same number of bits for
transmission. The LAR parameters are also decoded by the
LPC inverse filter so as to minimize the error e_n.
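
The text does not spell out how a LAR is obtained from a
reflection coefficient. A minimal sketch, assuming the
commonly used definition LAR = log10((1 + r) / (1 - r)) for
|r| < 1, is given below. The function names and the eight
sample coefficients are hypothetical.

```python
import math


def log_area_ratio(r):
    """Map a reflection coefficient r (|r| < 1) to a log-area ratio.

    Assumes LAR = log10((1 + r) / (1 - r)).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("reflection coefficient must lie in (-1, 1)")
    return math.log10((1.0 + r) / (1.0 - r))


def reflection_from_lar(lar):
    """Invert the mapping above, as a decoder would."""
    ratio = 10.0 ** lar
    return (ratio - 1.0) / (ratio + 1.0)


# Eight hypothetical reflection coefficients for one 20 ms frame.
reflection = [0.9, -0.7, 0.5, -0.3, 0.2, -0.1, 0.05, 0.0]
lars = [log_area_ratio(r) for r in reflection]
recovered = [reflection_from_lar(v) for v in lars]
```

The mapping stretches values of r close to +1 or -1, where
the synthesis filter is most sensitive to quantization error.
This is one reason the LARs rather than the raw reflection
coefficients are quantized.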


LTP analysis, which involves finding the pitch period p_n and
gain factor g_n, is then carried out such that the LTP residual
r_n is minimized. To minimize r_n, pitch extraction is done by
the LTP by determining the value of delay, D, which maximizes
the cross-correlation between the current STP error sample,
e_n, and a previous error sample, e_{n-D}. The extracted pitch
p_n and gain g_n are encoded and transmitted at a rate of
3.6 Kbps. The LTP residual, r_n, is weighted and decomposed
into three candidate excitation sequences. The energies of
these sequences are identified, and the one with the highest
energy is selected to represent the LTP residual [12] [18]. The
pulses in the excitation sequence are normalized to the highest
amplitude, quantized, and transmitted at a rate of 9.6 Kbps.
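
The delay search can be illustrated with a short sketch that
picks the delay D maximizing the cross-correlation between
the current block of STP error samples and the block D
samples earlier. The search range of 40 to 120 samples, the
function name, and the pulse-train test signal are assumptions
made for illustration.

```python
def find_ltp_lag(current, history, min_lag=40, max_lag=120):
    """Return (lag, gain) maximizing the cross-correlation.

    `history` holds past error samples, most recent last.
    `current` is the block of error samples being coded.
    """
    if len(history) < max_lag or len(current) > min_lag:
        raise ValueError("need len(history) >= max_lag and len(current) <= min_lag")
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        past = history[start:start + len(current)]
        corr = sum(c * p for c, p in zip(current, past))
        if corr > best_corr:  # strict test keeps the shortest lag on ties
            best_lag, best_corr = lag, corr
    start = len(history) - best_lag
    past = history[start:start + len(current)]
    energy = sum(p * p for p in past)
    gain = best_corr / energy if energy > 0.0 else 0.0
    return best_lag, gain


# Hypothetical error signal: one unit pulse every 50 samples.
signal = [1.0 if n % 50 == 0 else 0.0 for n in range(160)]
lag, gain = find_ltp_lag(current=signal[120:], history=signal[:120])
```

For this periodic test signal the search returns a lag equal
to the pulse spacing and a gain of one. Delays of 50 and 100
samples correlate equally well here, and the strict comparison
keeps the shorter one.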

Figure 1.8 shows a block diagram of the GSM speech
decoder. It consists of four blocks which perform operations
complementary to those of the encoder. The received
excitation parameters are RPE decoded and passed to the LTP
synthesis filter, which uses the pitch and gain parameters to
synthesize the long term signal. Short term synthesis is carried
out using the received reflection coefficients to recreate the
original speech signal.

The 260 bits that the coder produces for each 20 ms block of
speech are ordered, depending on their importance, into groups
of 50, 132, and 78 bits. The bits in the first group are very
important bits called type Ia bits. The next 132 bits are
important bits called type Ib bits [4] [5] [18], and the last 78
bits are called type II bits. The least important type II bits
have no error correction or detection.
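
The frame figures above can be checked with a short
calculation. The dictionary keys and variable names below are
illustrative only.

```python
FRAME_MS = 20  # duration of one speech frame in milliseconds

# Bits per 20 ms frame in each importance class, as listed above.
CLASS_BITS = {"Ia": 50, "Ib": 132, "II": 78}

total_bits = sum(CLASS_BITS.values())
net_kbps = total_bits / FRAME_MS  # bits per millisecond equals Kbps
protected_share = (CLASS_BITS["Ia"] + CLASS_BITS["Ib"]) / total_bits
```

The three classes add up to 260 bits per frame, which over
20 ms gives the 13 Kbps net rate of the codec. Seventy percent
of the bits fall in the two classes that are more important
than type II.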



Figure 1.8: Block diagram of GSM speech decoder [18] 


VII. GSM AUDIO CODECS / VOCODERS 

A variety of GSM audio codecs / vocoders are supported. 
These have been introduced at different times, and have 
different levels of performance. Although some of the early 
audio codecs are not as widely used these days, they are still 
described here as they form part of the GSM system. 


Table 2.2 b: Comparison of different technologies

Codec name    Bit rate (Kbps)    Compression technology
Full rate     13                 RPE-LTP
EFR           12.2               ACELP
Half rate     5.6                VSELP
AMR           12.2-4.75          ACELP
AMR-WB        23.85-6.60         ACELP


VIII. GSM AMR Codec 

The Adaptive Multi-Rate (AMR) codec is now the most widely
used GSM codec. The AMR codec was adopted by 3GPP in
October 1998 and it is used for both GSM and circuit
switched UMTS / WCDMA voice calls.

The AMR codec provides a choice of eight
different bit rates, as described in the table below. The bit rates
are based on frames that are 20 milliseconds long and contain
160 samples. The AMR codec uses a variety of different
techniques to provide the data compression [16, 18]. The
ACELP codec is used as the basis of the overall speech codec,
but other techniques are used in addition to this.
Discontinuous transmission is employed so that when there is
no speech activity the transmission is cut. Voice
Activity Detection (VAD) is used to indicate when there is
only background noise and no speech. To reassure
the user that the connection is still present, a
Comfort Noise Generator (CNG) is used to provide some
background noise, even when no speech data is being
transmitted. This is added locally at the receiver.

The use of the AMR codec also requires that optimized link 
adaptation is used so that the optimum data rate is selected to 
meet the requirements of the current radio channel conditions 
including its signal to noise ratio and capacity. This is 
achieved by reducing the source coding and increasing the 
channel coding. Although there is a reduction in voice clarity, 
the network connection is more robust and the link is 
maintained without dropout. Improvement levels of between 
4 and 6 dB may be experienced [11] [14]. However network 
operators are able to prioritize each station for either quality 
or capacity. The AMR codec has a total of eight rates: eight 
are available at full rate (FR), while six are available at half 
rate (HR). This gives a total of fourteen different modes. 
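
The trade-off between source coding and channel coding can
be sketched as a simple threshold rule. The quality thresholds
below are hypothetical, as is the reduced set of four modes.
The gross rate of 22.8 Kbps is the figure assumed here for a
GSM full-rate traffic channel. Real link adaptation also uses
hysteresis and in-band signaling, which are omitted.

```python
GROSS_FR_KBPS = 22.8  # assumed gross bit rate of a full-rate traffic channel

# (minimum channel quality in dB, speech rate in Kbps), best channel first.
# The thresholds are hypothetical values chosen only for illustration.
MODE_TABLE = [
    (13.0, 12.2),
    (10.0, 10.2),
    (7.0, 7.4),
    (float("-inf"), 4.75),
]


def select_mode(quality_db):
    """Pick the highest speech rate whose threshold the channel meets."""
    for min_quality_db, speech_kbps in MODE_TABLE:
        if quality_db >= min_quality_db:
            return speech_kbps
    raise AssertionError("unreachable: last threshold is -inf")


def channel_coding_kbps(speech_kbps):
    """Bit rate left over for error protection in the gross channel rate."""
    return round(GROSS_FR_KBPS - speech_kbps, 2)


good_channel = select_mode(15.0)
poor_channel = select_mode(3.0)
```

On a good channel the sketch selects the 12.2 Kbps mode and
leaves about 10.6 Kbps for channel coding. On a poor channel
it falls back to 4.75 Kbps and leaves about 18.05 Kbps for
error protection, which is the robustness gain described above.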

Table 2.3: AMR codec data rates

Mode        Bit rate (Kbps)    Full rate (FR) / Half rate (HR)
AMR 12.2    12.2               FR
AMR 10.2    10.2               FR
AMR 7.95    7.95               FR/HR
AMR 7.40    7.40               FR/HR
AMR 6.70    6.70               FR/HR
AMR 5.90    5.90               FR/HR
AMR 5.15    5.15               FR/HR
AMR 4.75    4.75               FR/HR
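
Since every mode uses 20 ms frames, the number of speech
bits per frame follows directly from the bit rate. A small
calculation, with illustrative variable names, is:

```python
FRAME_MS = 20           # AMR frame length in milliseconds
SAMPLES_PER_FRAME = 160  # samples per frame, as stated in the text

AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]

# Kbps multiplied by milliseconds gives bits; round() removes float noise.
bits_per_frame = {rate: round(rate * FRAME_MS) for rate in AMR_MODES_KBPS}

sample_rate_hz = SAMPLES_PER_FRAME * 1000 // FRAME_MS
```

The highest mode carries 244 speech bits per frame and the
lowest carries 95. The 160 samples per 20 ms frame correspond
to a sampling rate of 8000 Hz.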


AMR-WB codec: The Adaptive Multi-Rate Wideband (AMR-WB)
codec, also known under its ITU designation of
G.722.2, is based on the earlier popular Adaptive Multi-Rate
(AMR) codec. AMR-WB also uses an ACELP basis for its
operation, but it has been further developed and
provides improved speech quality as a result of the wider
speech bandwidth that it encodes. AMR-WB has a bandwidth
extending from 50 to 7000 Hz, which is significantly wider than
the 300 to 3400 Hz bandwidth used by standard telephones.
This comes at the cost of additional processing, but
with advances in IC technology in recent years, this is
perfectly acceptable [9] [11] [14].

The AMR-WB codec contains a number of functional 
areas: it primarily includes a set of fixed rate speech and 
channel codec modes. It also includes other codec functions 
including: a Voice Activity Detector (VAD); Discontinuous 
Transmission (DTX) functionality for GSM; and Source 
Controlled Rate (SCR) functionality for UMTS applications 
[5] [7]. Further functionality includes in-band signaling for 
codec mode transmission, and link adaptation for control of 
the mode selection. The AMR-WB codec has a 16 kHz
sampling rate and the coding is performed in blocks of 20 ms.
There are two frequency bands that are used: 50-6400 Hz and
6400-7000 Hz. These are coded separately to reduce the
codec complexity. This split also serves to focus the bit 
allocation into the subjectively most important frequency 
range. 

The lower frequency band uses an ACELP codec algorithm, 
although a number of additional features have been included 
to improve the subjective quality of the audio. Linear 
prediction analysis is performed once per 20 ms frame. Also, 
fixed and adaptive excitation codebooks are searched every 5
ms for optimal codec parameter values [1, 3].
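
The AMR-WB frame structure described above works out as
follows. The variable names are illustrative.

```python
SAMPLE_RATE_HZ = 16000  # AMR-WB sampling rate
FRAME_MS = 20           # linear prediction analysis once per frame
SUBFRAME_MS = 5         # codebook search interval

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
subframes_per_frame = FRAME_MS // SUBFRAME_MS
samples_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000

# Widths of the two separately coded bands, in Hz.
lower_band_hz = 6400 - 50
upper_band_hz = 7000 - 6400
```

Each 20 ms frame therefore holds 320 samples and is split
into four subframes of 80 samples, so the excitation codebooks
are searched four times for every linear prediction analysis.
The lower band is more than ten times wider than the upper
band.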

IX. AMR BASIC OPERATION 

Figure 2.8 shows a basic block diagram of the AMR codec
in GSM. The AMR codec contains a set of fixed rate speech and
channel codecs, in-band signaling and link adaptation [16, 19].
Each codec mode provides a different level of error
protection through a different distribution of the available
gross bit-rate between speech and channel coding.

The link adaptation process bears responsibility for measuring
the channel quality and selecting the optimal speech and
channel codecs. In-band signaling transmits the measured
channel quality and codec mode information over the air
interface. The in-band signaling is transmitted along with the
speech data. The Mobile Station (MS) and the Base Transceiver
Station (BTS) both perform channel quality estimation for the
receive signal path. Based on the channel quality
measurements, a Codec Mode Command (over downlink to
the MS) or Codec Mode Request (over uplink to network) is
sent in-band over the air interface [15] [16].



Figure 2.8: Simplified block diagram of AMR speech
coder

X. Conclusion 

This paper has surveyed speech coders and vocoders and
described the different types of codec technologies that are
used in GSM.

The radio spectrum available to GSM is limited. It is therefore
essential to save bandwidth and power and to increase the
channel capacity. Codecs are an important factor in GSM
because uncompressed voice, whether transferred or recorded,
would consume considerable bandwidth and power. Of the
coders used in GSM that were studied here, the AMR codec
accepts some reduction in voice quality in exchange for lower
bandwidth, while the AMR-WB codec is used to improve voice
quality in an efficient manner.